{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-pipelines/auto-ml-forecasting-pipelines.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<font color=\"red\" size=\"5\"><strong>!Important!</strong> </br>This notebook is outdated and is not supported by the AutoML Team. Please use the supported version ([link](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/pipelines/1h_automl_in_pipeline/automl-forecasting-in-pipeline)).</font>\n",
        "</br>\n",
        "</br>\n",
        "<font color=\"red\" size=\"5\">\n",
        "For examples illustrating how to build pipelines with components, please use the following links:</font>\n",
        "<ul>\n",
        "    <li><a href=\"https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/pipelines/1k_demand_forecasting_with_pipeline_components/automl-forecasting-demand-many-models-in-pipeline\">Many Models</a></li>\n",
        "    <li><a href=\"https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/pipelines/1k_demand_forecasting_with_pipeline_components/automl-forecasting-demand-hierarchical-timeseries-in-pipeline\">Hierarchical Time Series</a></li>\n",
        "    <li><a href=\"https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-distributed-tcn\">Distributed TCN</a></li>\n",
        "</ul>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Training and Inferencing AutoML Forecasting Model Using Pipelines"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Introduction\n",
        "\n",
        "In this notebook, we demonstrate how to use piplines to train and inference on AutoML Forecasting model. Two pipelines will be created: one for training AutoML model, and the other is for inference on AutoML model. We'll also demonstrate how to schedule the inference pipeline so you can get inference results periodically (with refreshed test dataset). Make sure you have executed the [configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) before running this notebook. In this notebook you will learn how to:\n",
        "\n",
        "- Configure AutoML using AutoMLConfig for forecasting tasks using pipeline AutoMLSteps.\n",
        "- Create and register an AutoML model using AzureML pipeline.\n",
        "- Inference and schdelue the pipeline using registered model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n",
        "\n",
        "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import json\n",
        "import logging\n",
        "import os\n",
        "\n",
        "import pandas as pd\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "from azureml.train.automl import AutoMLConfig"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"This notebook was created using version 1.38.0 of the Azure ML SDK\")\n",
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Accessing the Azure ML workspace requires authentication with Azure.\n",
        "\n",
        "The default authentication is interactive authentication using the default tenant. Executing the ws = Workspace.from_config() line in the cell below will prompt for authentication the first time that it is run.\n",
        "\n",
        "If you have multiple Azure tenants, you can specify the tenant by replacing the ws = Workspace.from_config() line in the cell below with the following:\n",
        "```\n",
        "from azureml.core.authentication import InteractiveLoginAuthentication\n",
        "auth = InteractiveLoginAuthentication(tenant_id = 'mytenantid')\n",
        "ws = Workspace.from_config(auth = auth)\n",
        "```\n",
        "If you need to run in an environment where interactive login is not possible, you can use Service Principal authentication by replacing the ws = Workspace.from_config() line in the cell below with the following:\n",
        "```\n",
        "from azureml.core.authentication import ServicePrincipalAuthentication\n",
        "auth = ServicePrincipalAuthentication('mytenantid', 'myappid', 'mypassword')\n",
        "ws = Workspace.from_config(auth = auth)\n",
        "```\n",
        "For more details, see aka.ms/aml-notebook-auth"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "dstor = ws.get_default_datastore()\n",
        "\n",
        "# Choose a name for the run history container in the workspace.\n",
        "experiment_name = \"forecasting-pipeline\"\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output[\"Subscription ID\"] = ws.subscription_id\n",
        "output[\"Workspace\"] = ws.name\n",
        "output[\"Resource Group\"] = ws.resource_group\n",
        "output[\"Location\"] = ws.location\n",
        "output[\"Run History Name\"] = experiment_name\n",
        "pd.set_option(\"display.max_colwidth\", None)\n",
        "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Compute"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Compute \n",
        "\n",
        "#### Create or Attach existing AmlCompute\n",
        "\n",
        "You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
        "\n",
        "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
        "\n",
        "#### Creation of AmlCompute takes approximately 5 minutes. \n",
        "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
        "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your CPU cluster\n",
        "amlcompute_cluster_name = \"forecast-step-cluster\"\n",
        "\n",
        "# Verify that cluster does not exist already\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
        "    print(\"Found existing cluster, use it.\")\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(\n",
        "        vm_size=\"STANDARD_DS12_V2\", max_nodes=4\n",
        "    )\n",
        "    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Data\n",
        "You are now ready to load the historical orange juice sales data. For demonstration purposes, we extract sales time-series for just a few of the stores. We will load the CSV file into a plain pandas DataFrame; the time column in the CSV is called _WeekStarting_, so it will be specially parsed into the datetime type."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "time_column_name = \"WeekStarting\"\n",
        "train = pd.read_csv(\"oj-train.csv\", parse_dates=[time_column_name])\n",
        "\n",
        "train.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Each row in the DataFrame holds a quantity of weekly sales for an OJ brand at a single store. The data also includes the sales price, a flag indicating if the OJ brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also include the logarithm of the sales quantity. The Dominick's grocery data is commonly used to illustrate econometric modeling techniques where logarithms of quantities are generally preferred.    \n",
        "\n",
        "The task is now to build a time-series model for the _Quantity_ column. It is important to note that this dataset is comprised of many individual time-series - one for each unique combination of _Store_ and _Brand_. To distinguish the individual time-series, we define the **time_series_id_column_names** - the columns whose values determine the boundaries between time-series: "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "time_series_id_column_names = [\"Store\", \"Brand\"]\n",
        "nseries = train.groupby(time_series_id_column_names).ngroups\n",
        "print(\"Data contains {0} individual time-series.\".format(nseries))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Test Splitting\n",
        "We now split the data into a training and a testing set for later forecast prediction. The test set will contain the final 4 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the time series identifier columns."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "n_test_periods = 4\n",
        "\n",
        "test = pd.read_csv(\"oj-test.csv\", parse_dates=[time_column_name])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Upload data to datastore\n",
        "The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace), is paired with the storage account, which contains the default data store. We will use it to upload the train and test data and create [tabular datasets](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training and testing. A tabular dataset defines a series of lazily-evaluated, immutable operations to load data from the data source into tabular representation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.data.dataset_factory import TabularDatasetFactory\n",
        "\n",
        "datastore = ws.get_default_datastore()\n",
        "train_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
        "    train, target=(datastore, \"dataset/\"), name=\"dominicks_OJ_train_pipeline\"\n",
        ")\n",
        "\n",
        "test_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
        "    test, target=(datastore, \"dataset/\"), name=\"dominicks_OJ_test_pipeline\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Training"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Modeling\n",
        "\n",
        "For forecasting tasks, AutoML uses pre-processing and estimation steps that are specific to time-series. AutoML will undertake the following pre-processing steps:\n",
        "* Detect time-series sample frequency (e.g. hourly, daily, weekly) and create new records for absent time points to make the series regular. A regular time series has a well-defined frequency and has a value at every sample point in a contiguous time span \n",
        "* Impute missing values in the target (via forward-fill) and feature columns (using median column values) \n",
        "* Create features based on time series identifiers to enable fixed effects across different series\n",
        "* Create time-based features to assist in learning seasonal patterns\n",
        "* Encode categorical variables to numeric quantities\n",
        "\n",
        "In this notebook, AutoML will train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series. If you're looking for training multiple models for different time-series, please see the many-models notebook.\n",
        "\n",
        "You are almost ready to start an AutoML training job. First, we need to define the target column."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "target_column_name = \"Quantity\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Forecasting Parameters\n",
        "To define forecasting parameters for your experiment training, you can leverage the ForecastingParameters class. The table below details the forecasting parameter we will be passing into our experiment.\n",
        "\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**time_column_name**|The name of your time column.|\n",
        "|**forecast_horizon**|The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly).|\n",
        "|**time_series_id_column_names**|The column names used to uniquely identify the time series in data that has multiple rows with the same timestamp. If the time series identifiers are not defined, the data set is assumed to be one time series.|\n",
        "|**freq**|Forecast frequency. This optional parameter represents the period with which the forecast is desired, for example, daily, weekly, yearly, etc. Use this parameter for the correction of time series containing irregular data points or for padding of short time series. The frequency needs to be a pandas offset alias. Please refer to [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects) for more information.\n",
        "|**cv_step_size**|Number of periods between two consecutive cross-validation folds. The default value is \"auto\", in which case AutoMl determines the cross-validation step size automatically, if a validation set is not provided. Or users could specify an integer value."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
        "\n",
        "forecasting_parameters = ForecastingParameters(\n",
        "    time_column_name=time_column_name,\n",
        "    forecast_horizon=n_test_periods,\n",
        "    time_series_id_column_names=time_series_id_column_names,\n",
        "    freq=\"W-THU\",  # Set the forecast frequency to be weekly (start on each Thursday),\n",
        "    cv_step_size=\"auto\",\n",
        ")\n",
        "\n",
        "automl_config = AutoMLConfig(\n",
        "    task=\"forecasting\",\n",
        "    debug_log=\"automl_oj_sales_errors.log\",\n",
        "    primary_metric=\"normalized_mean_absolute_error\",\n",
        "    experiment_timeout_hours=0.25,\n",
        "    training_data=train_dataset,\n",
        "    label_column_name=target_column_name,\n",
        "    compute_target=compute_target,\n",
        "    enable_early_stopping=True,\n",
        "    n_cross_validations=\"auto\",  # Feel free to set to a small integer (>=2) if runtime is an issue.\n",
        "    verbosity=logging.INFO,\n",
        "    max_cores_per_iteration=-1,\n",
        "    forecasting_parameters=forecasting_parameters,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.pipeline.core import PipelineData, TrainingOutput\n",
        "from azureml.pipeline.steps import AutoMLStep\n",
        "from azureml.pipeline.core import Pipeline, PipelineParameter\n",
        "from azureml.pipeline.steps import PythonScriptStep\n",
        "\n",
        "metrics_output_name = \"metrics_output\"\n",
        "best_model_output_name = \"best_model_output\"\n",
        "model_file_name = \"model_file\"\n",
        "metrics_data_name = \"metrics_data\"\n",
        "\n",
        "metrics_data = PipelineData(\n",
        "    name=metrics_data_name,\n",
        "    datastore=datastore,\n",
        "    pipeline_output_name=metrics_output_name,\n",
        "    training_output=TrainingOutput(type=\"Metrics\"),\n",
        ")\n",
        "model_data = PipelineData(\n",
        "    name=model_file_name,\n",
        "    datastore=datastore,\n",
        "    pipeline_output_name=best_model_output_name,\n",
        "    training_output=TrainingOutput(type=\"Model\"),\n",
        ")\n",
        "\n",
        "automl_step = AutoMLStep(\n",
        "    name=\"automl_module\",\n",
        "    automl_config=automl_config,\n",
        "    outputs=[metrics_data, model_data],\n",
        "    allow_reuse=False,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Register Model Step"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Run Configuration and Environment\n",
        "To have a pipeline step run, we first need an environment to run the jobs. The environment can be build using the following code."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.runconfig import CondaDependencies, RunConfiguration\n",
        "\n",
        "# create a new RunConfig object\n",
        "conda_run_config = RunConfiguration(framework=\"python\")\n",
        "\n",
        "# Set compute target to AmlCompute\n",
        "conda_run_config.target = compute_target\n",
        "\n",
        "conda_run_config.docker.use_docker = True\n",
        "\n",
        "cd = CondaDependencies.create(\n",
        "    pip_packages=[\n",
        "        \"azureml-sdk[automl]\",\n",
        "        \"applicationinsights\",\n",
        "        \"azureml-opendatasets\",\n",
        "        \"azureml-defaults\",\n",
        "    ],\n",
        "    conda_packages=[\"numpy==1.19.5\"],\n",
        "    pin_sdk_version=False,\n",
        ")\n",
        "conda_run_config.environment.python.conda_dependencies = cd\n",
        "\n",
        "print(\"run config is ready\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Step to register the model.\n",
        "The following code generates a step to register the model to the workspace from previous step. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# The model name with which to register the trained model in the workspace.\n",
        "model_name_str = \"ojmodel\"\n",
        "model_name = PipelineParameter(\"model_name\", default_value=model_name_str)\n",
        "\n",
        "\n",
        "register_model_step = PythonScriptStep(\n",
        "    script_name=\"register_model.py\",\n",
        "    name=\"register_model\",\n",
        "    source_directory=\"scripts\",\n",
        "    allow_reuse=False,\n",
        "    arguments=[\n",
        "        \"--model_name\",\n",
        "        model_name,\n",
        "        \"--model_path\",\n",
        "        model_data,\n",
        "        \"--ds_name\",\n",
        "        \"dominicks_OJ_train\",\n",
        "    ],\n",
        "    inputs=[model_data],\n",
        "    compute_target=compute_target,\n",
        "    runconfig=conda_run_config,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Build the Pipeline"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "training_pipeline = Pipeline(\n",
        "    description=\"training_pipeline\",\n",
        "    workspace=ws,\n",
        "    steps=[automl_step, register_model_step],\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Submit Pipeline Run"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "training_pipeline_run = experiment.submit(training_pipeline)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "training_pipeline_run.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Get metrics for each runs"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "output_dir = \"train_output\"\n",
        "pipeline_output = training_pipeline_run.get_pipeline_output(\"metrics_output\")\n",
        "pipeline_output.download(output_dir)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "file_path = os.path.join(output_dir, pipeline_output.path_on_datastore)\n",
        "with open(file_path) as f:\n",
        "    metrics = json.load(f)\n",
        "for run_id, metrics in metrics.items():\n",
        "    print(\"{}: {}\".format(run_id, metrics[\"normalized_root_mean_squared_error\"][0]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Inference"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "There are several ways to do the inference, for here we will demonstrate how to use the registered model and pipeline to do the inference. (how to register a model https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Get Inference Pipeline Environment\n",
        "To trigger an inference pipeline run, we first need a running environment for run that contains all the appropriate packages for the model unpickling. This environment can be either assess from the training run or using the `yml` file that comes with the model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.train.automl.run import AutoMLRun\n",
        "\n",
        "for step in training_pipeline_run.get_steps():\n",
        "    if step.properties.get(\"StepType\") == \"AutoMLStep\":\n",
        "        automl_run = AutoMLRun(experiment, step.id)\n",
        "        break\n",
        "\n",
        "best_run = automl_run.get_best_child()\n",
        "inference_env = best_run.get_environment()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "After we have the environment for the inference, we could build run config based on this environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run_config = RunConfiguration()\n",
        "run_config.environment = inference_env"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Build and submit the inference pipeline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The inference pipeline will create two different format of outputs, 1) a tabular dataset that contains the prediction and 2) an `OutputFileDatasetConfig` that can be used for the sequential pipeline steps."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.data import OutputFileDatasetConfig\n",
        "\n",
        "output_data = OutputFileDatasetConfig(name=\"prediction_result\")\n",
        "\n",
        "output_ds_name = \"oj-output\"\n",
        "\n",
        "inference_step = PythonScriptStep(\n",
        "    name=\"infer-results\",\n",
        "    source_directory=\"scripts\",\n",
        "    script_name=\"infer.py\",\n",
        "    arguments=[\n",
        "        \"--model_name\",\n",
        "        model_name_str,\n",
        "        \"--ouput_dataset_name\",\n",
        "        output_ds_name,\n",
        "        \"--test_dataset_name\",\n",
        "        test_dataset.name,\n",
        "        \"--target_column_name\",\n",
        "        target_column_name,\n",
        "        \"--output_path\",\n",
        "        output_data,\n",
        "    ],\n",
        "    compute_target=compute_target,\n",
        "    allow_reuse=False,\n",
        "    runconfig=run_config,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "inference_pipeline = Pipeline(ws, [inference_step])\n",
        "inference_run = experiment.submit(inference_pipeline)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "inference_run.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Get the predicted data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import Dataset\n",
        "\n",
        "inference_ds = Dataset.get_by_name(ws, output_ds_name)\n",
        "inference_df = inference_ds.to_pandas_dataframe()\n",
        "inference_df.tail(5)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Schedule Pipeline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This section is about how to schedule a pipeline for periodically predictions. For more info about pipeline schedule and pipeline endpoint, please follow this [notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "inference_published_pipeline = inference_pipeline.publish(\n",
        "    name=\"OJ Inference Test\", description=\"OJ Inference Test\"\n",
        ")\n",
        "print(\"Newly published pipeline id: {}\".format(inference_published_pipeline.id))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If `test_dataset` is going to refresh every 4 weeks before Friday 16:00 and we want to predict every 4 weeks (forecast_horizon), we can schedule our pipeline to run every 4 weeks at 16:00 to get daily inference results. You can refresh your test dataset (a newer version will be created) periodically when new data is available (i.e. target column in test dataset would have values in the beginning as context data, and followed by NaNs to be predicted). The inference pipeline will pick up context to further improve the forecast accuracy."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# schedule\n",
        "\n",
        "from azureml.pipeline.core.schedule import ScheduleRecurrence, Schedule\n",
        "\n",
        "recurrence = ScheduleRecurrence(\n",
        "    frequency=\"Week\", interval=4, week_days=[\"Friday\"], hours=[16], minutes=[0]\n",
        ")\n",
        "\n",
        "schedule = Schedule.create(\n",
        "    workspace=ws,\n",
        "    name=\"OJ_Inference_schedule\",\n",
        "    pipeline_id=inference_published_pipeline.id,\n",
        "    experiment_name=\"Schedule-run-OJ\",\n",
        "    recurrence=recurrence,\n",
        "    wait_for_provisioning=True,\n",
        "    description=\"Schedule Run\",\n",
        ")\n",
        "\n",
        "# You may want to make sure that the schedule is provisioned properly\n",
        "# before making any further changes to the schedule\n",
        "\n",
        "print(\"Created schedule with id: {}\".format(schedule.id))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### [Optional] Disable schedule"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "schedule.disable()"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "jialiu"
      }
    ],
    "category": "tutorial",
    "celltoolbar": "Raw Cell Format",
    "compute": [
      "Remote"
    ],
    "datasets": [
      "Orange Juice Sales"
    ],
    "deployment": [
      "Azure Container Instance"
    ],
    "exclude_from_index": false,
    "framework": [
      "Azure ML AutoML"
    ],
    "friendly_name": "Forecasting orange juice sales with deployment",
    "index_order": 1,
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.5"
    },
    "tags": [
      "None"
    ],
    "task": "Forecasting",
    "vscode": {
      "interpreter": {
        "hash": "6bd77c88278e012ef31757c15997a7bea8c943977c43d6909403c00ae11d43ca"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}